Journal of Computational Chemistry
Wiley
Preprints posted in the last 30 days, ranked by how well they match Journal of Computational Chemistry's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Otten, L.; Leung, J. M. G.; Chong, L.; Zuckerman, D. M.
Recently, a number of tools have been released that generate ensembles of protein structures based on artificial intelligence (AI) approaches. Although ensembles generated by the tools differ significantly, we demonstrate a computational path to harmonizing the various outputs under a stationary condition using two complementary physics-based approaches. In the first stage, the AI ensemble is used to seed a weighted ensemble (WE) simulation, promoting relaxation toward the steady state. In the second stage, trajectory segments generated by WE are reweighted to steady state using the recently developed RiteWeight (RW) algorithm. We applied this approach to generate an atomically-detailed equilibrium ensemble of unliganded adenylate kinase conformations, starting from ensembles produced by three AI tools: AFSample2, ESMFlow-PDB (trained from PDB structures), and ESMFlow-MD (trained from molecular dynamics simulation data). Dramatic differences in the AI-generated ensembles are largely erased during the WE-RW process, yielding a consistent description of the equilibrium ensemble for a given force field.
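The weighted ensemble stage described above relies on resampling walkers without changing the total probability they carry. The following is a minimal sketch of a generic split/merge step for a single bin, not the authors' protocol and not the API of any WE package; the function name `resample_bin`, the `(state, weight)` tuple layout, and the fixed per-bin target count are assumptions made for illustration, and the binning along a progress coordinate is omitted.

```python
import random


def resample_bin(walkers, target, rng):
    """Return exactly `target` walkers whose weights sum to the input total.

    walkers: list of (state, weight) tuples with positive weights (hypothetical layout).
    target:  desired number of walkers in this bin (must be >= 1).
    rng:     a random.Random instance, used only when merging.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    walkers = list(walkers)
    if not walkers:
        return []

    # Split: replace the heaviest walker by two copies carrying half its weight.
    while len(walkers) < target:
        i = max(range(len(walkers)), key=lambda k: walkers[k][1])
        state, weight = walkers.pop(i)
        walkers.append((state, weight / 2.0))
        walkers.append((state, weight / 2.0))

    # Merge: combine the two lightest walkers; the survivor is chosen with
    # probability proportional to weight and inherits the summed weight.
    while len(walkers) > target:
        walkers.sort(key=lambda sw: sw[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        total = w1 + w2
        survivor = s1 if rng.random() < w1 / total else s2
        walkers = [(survivor, total)] + walkers[2:]

    return walkers


if __name__ == "__main__":
    start = [("a", 0.5), ("b", 0.3), ("c", 0.2)]
    print(resample_bin(start, 5, random.Random(0)))
```

Because both operations conserve the summed weight, seeding from different AI-generated ensembles changes only the starting distribution, not the bookkeeping of probability during relaxation.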
Yamauchi, M.; Murata, Y.; Niina, T.; Takada, S.
There is a growing demand for molecular dynamics simulations to explore longer timescale behavior of giant protein-DNA complexes such as chromatin. To address this need, we extended OpenCafeMol, a GPU-accelerated residue-level coarse-grained molecular dynamics simulator originally developed for proteins and lipids, to support 3SPN.2 and 3SPN.2C DNA models. We also implemented a hydrogen-bond-type many-body potential to model DNA-protein interactions more accurately. To further improve computational efficiency, we introduced a localized scheme for calculating base-pairing and cross-stacking interactions. Benchmark tests show that OpenCafeMol on a single GPU achieves up to 200-fold speed-up for DNA-only systems and up to 100-fold speed-up for DNA-protein complexes compared to CPU-based simulations. To demonstrate the capability of our implementation for long-timescale biological processes, we simulated an archaeal SMC-ScpA complex undergoing DNA translocation via segment capture (a proposed mechanism for DNA loop extrusion) in the presence of a DNA-bound obstacle. We observed continuous captured-loop growth accompanied by obstacle bypass within the segment capture framework.
Ji, J.; Lyman, E.
With the advance of hardware and software for molecular dynamics simulation it has become routine to obtain trajectories that are tens of microseconds in duration for all kinds of protein machinery. This shifts the burden of work onto analysis of the simulation data and opens opportunities for more rigorous and reproducible observations on mechanism. Toward this end we developed an investigator-blind analysis pipeline which operates on featurized simulation data, performs unsupervised clustering, and then identifies which input features are most discriminatory of cluster identity. Application of this pipeline to a large set of G-protein coupled receptor simulation data shows that it identifies several well-known microswitches. Inspection of these structural elements reveals changes in conformation that are known to accompany functional transitions of the receptor. In addition to these known structural elements the analysis also identifies two possibly new structural motifs: the kink in transmembrane helix 2, and a coupled "piston-like" motion of TM2 and TM3.
Teshirogi, Y.; Terada, T.
Molecular dynamics (MD) simulations are a powerful tool for investigating biomolecular dynamics underlying biological functions. However, the accessible spatiotemporal scales of conventional all-atom simulations remain limited by high computational costs. Coarse-graining reduces these costs by decreasing the number of interaction sites and enabling longer timesteps. In extreme cases, proteins are represented as single spherical particles; while such approximations facilitate cellular-scale simulations, they often sacrifice essential structural information, such as molecular shape and interaction anisotropy. Here, we present CGRig, a rigid-body protein model with residue-level interaction sites designed for long-time, large-scale simulations. In CGRig, each protein is treated as a single rigid body embedding residue-level interaction sites. Its translational and rotational motions are described by the overdamped Langevin equation incorporating a shape-dependent friction matrix. Intermolecular interactions are calculated using Gō-like native contact potentials, Debye-Hückel electrostatics, and volume exclusion. We validated that CGRig accurately reproduces the translational and rotational diffusion coefficients expected from the friction matrix for an isolated protein. For dimeric systems, the model successfully maintained native complex structures. Furthermore, two initially separated proteins converged into the correct complex with an association rate consistent with all-atom simulations. Notably, CGRig achieved a simulation performance exceeding 17 s/day for a 1,024-molecule system. These results demonstrate that CGRig provides an efficient framework for simulating protein assembly while retaining residue-level interaction specificity, making it a valuable tool for investigating large-scale biomolecular self-assembly.
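For readers unfamiliar with the Debye-Hückel term mentioned above, the sketch below evaluates the textbook screened Coulomb energy between two point charges. It is not taken from CGRig: the function names, the relative permittivity of 78.4, the temperature of 298.15 K, and the unit choices (nm, kJ/mol, mol/L) are all assumptions made for illustration.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
NA = 6.02214076e23           # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19   # elementary charge, C


def debye_length_nm(ionic_strength_molar, temperature=298.15, eps_r=78.4):
    """Debye screening length in nm for a given ionic strength in mol/L."""
    ionic_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    lam_m = math.sqrt(
        EPS0 * eps_r * KB * temperature / (2.0 * NA * E_CHARGE**2 * ionic_si)
    )
    return lam_m * 1e9


def debye_huckel_energy(q1, q2, r_nm, ionic_strength_molar,
                        temperature=298.15, eps_r=78.4):
    """Screened Coulomb energy in kJ/mol for charges q1, q2 (units of e) at r_nm."""
    lam_nm = debye_length_nm(ionic_strength_molar, temperature, eps_r)
    # Unscreened Coulomb energy per mole for unit charges at distance r, in kJ/mol.
    coulomb = E_CHARGE**2 * NA / (4.0 * math.pi * EPS0 * eps_r * r_nm * 1e-9) / 1000.0
    return q1 * q2 * coulomb * math.exp(-r_nm / lam_nm)


if __name__ == "__main__":
    print(debye_length_nm(0.15))
    print(debye_huckel_energy(+1, -1, 1.0, 0.15))
```

The exponential factor is what makes the interaction short-ranged at physiological salt, which keeps pairwise electrostatics affordable when many rigid bodies are simulated together.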
Chattaraj, A.; Kanovich, D. S.; Ranganathan, S.; Shakhnovich, E. I.
Phase-separated condensates are recognized as a ubiquitous mechanism of spatial organization in cell biology. Biophysical modeling of condensates provides critical insights into the dynamics and functions of these subcellular structures that are difficult to extract via experiments. Here we present an efficient computational pipeline, CASPULE (Condensate Analysis of Sticker Spacer Polymers Using the LAMMPS Engine), to simulate and analyze biological condensates made of sticker-spacer polymers. CASPULE implements a unique force field that combines traditional Langevin dynamics with a "detailed balance proof" protocol for single-valent bond formation between stickers. This framework allows us to study the non-trivial biophysics that emerges from single-valent sticker interactions coupled with the separation of energetic contributions between stickers and spacers. We provide detailed documentation on how to set up the simulation environment, perform simulations, and analyze the results. Through case studies, we highlight the utility and efficacy of our pipeline. Importantly, we provide statistical parameters to characterize the cluster size distribution often observed in biological systems. We envision this tool to be broadly useful in decoding the interplay of kinetics and thermodynamics underlying the formation and function of biological condensates.
Wiebeler, C.; Falkner, S.; Schwierz, N.
Accurate ion force fields are essential for molecular dynamics simulations of biomolecular systems, particularly in combination with modern water models such as OPC. While OPC water improves the description of bulk water and biomolecules, the transferability of existing ion force fields to this model remains an open question. Here, we systematically assess the transferability of monovalent and divalent ion force field parameters (Li+, Na+, K+, Cs+, Mg2+, Ca2+, Sr2+, Ba2+, Cl- and Br-) to OPC water by comparing single-ion and ion-pairing properties with experimental data. Our analysis reveals that no single literature parameter set provides accurate results for all ions when directly transferred to OPC water. We hence introduce the MS/G-LB(OPC) force field, which combines Mamatkulov-Schwierz-Grotz cation parameters with Loche-Bonthuis anion parameters. MS/G-LB(OPC) reproduces hydration free energies, first-shell structural properties and activity derivatives at low salt concentrations. Our results demonstrate that transferring ion parameters to OPC can lead to significant and ion-specific deviations from experimental data, making careful validation essential. At the same time, the systematic transfer and combination of ion parameters from existing force fields can provide a practical and computationally efficient alternative to full reparameterization. MS/G-LB(OPC) is available at https://git.rz.uni-augsburg.de/cbio-gitpub/opc-ion-force-fields.
Jimenez Garcia, J. C.; Lopez-Gallego, F.; Lopez, X.; De Sancho, D.
The rational design of biomolecule immobilization strategies requires molecular-level understanding of how surface properties, tethering geometry, and structural dynamics jointly influence stability and function. Recently, coarse-grained molecular dynamics simulations based on the Martini force field have emerged as an efficient framework for studying enzyme-surface interactions. However, the reproducible construction of immobilized systems with controlled orientations remains technically challenging, usually involving multiple computational tools. Here we present MartiniSurf, an open-source command-line framework for the preparation of protein and DNA systems immobilized on solid supports within the Martini paradigm. MartiniSurf integrates automated structure retrieval and cleaning, coarse graining via tools from the Martini force field software ecosystem, customizable surface generation, and biomolecule orientation based on user-defined anchoring residues, producing complete GROMACS-ready simulation systems. The framework supports both implicit restraint-based anchoring and explicit linker-mediated immobilization, including surfaces functionalized with user-defined ligands or linker-like moieties, enabling representation of mono- and multivalent attachment geometries at different modeling resolutions. Structure-based GōMartini potentials can be incorporated for proteins, while DNA systems are modeled using Martini 2. Optional substrate insertion, pre-coarse-grained complex handling, and automated solvation and ionization further extend system flexibility. By integrating these components into a unified workflow, MartiniSurf enables systematic and high-throughput in silico exploration of surface-tethered biomolecules and provides a robust computational platform for rational immobilization studies.
Cheng, K.; Liu, Y.; Nie, Z.; Lin, M.; Hou, Y.; Tao, Y.; Liu, C.; Chen, J.; Mao, Y.; Tian, Y.
Understanding the structural dynamics of biomolecules is crucial for uncovering biological functions. As molecular dynamics (MD) simulation data becomes more available, deep generative models have been developed to synthesize realistic MD trajectories. However, existing methods produce fixed-length trajectories by jointly denoising high-dimensional spatiotemporal representations, which conflicts with MD's frame-by-frame integration process and fails to capture time-dependent conformational diversity. Inspired by MD's sequential nature, we introduce a new probabilistic autoregressive (ProAR) framework for trajectory generation. ProAR uses a dual-network system that models each frame as a multivariate Gaussian distribution and employs an anti-drifting sampling strategy to reduce cumulative errors. This approach captures conformational uncertainty and time-coupled structural changes while allowing flexible generation of trajectories of arbitrary length. Experiments on ATLAS, a large-scale protein MD dataset, demonstrate that for long trajectory generation, our model achieves a 7.5% reduction in reconstruction RMSE and an average 25.8% improvement in conformation change accuracy compared to previous state-of-the-art methods. For the conformation sampling task, it performs comparably to specialized time-independent models, providing a flexible and dependable alternative to standard MD simulations.
Woody Santos, J. B.; Chen, L.; Miranda Quintana, R. A.
We present mdBIRCH, an online clustering method that adapts the BIRCH CF-tree to molecular dynamics (MD) data by using a merge test calibrated directly to RMSD. Each arriving frame is routed to the nearest centroid and added only if the post-merge radius computed from the cluster feature remains within a user-supplied threshold. This keeps the average deviation to each cluster centroid bounded as the cluster grows and preserves a simple interpretation of resolution in physical units. We evaluate mdBIRCH on a β-heptapeptide and the HP35 system. We propose two protocols to make the threshold selection easier: (a) RMSD-anchored runs that use controlled structural edits to define interpretable operating points and (b) a blind sweep that tracks how cluster count, occupancy, and coverage change with the threshold. In both systems, increasing the threshold reduces the number of clusters, concentrates coverage in high-occupancy states, and broadens within-cluster RMSD distributions. Furthermore, because decisions rely only on cluster summaries, mdBIRCH completely avoids the need for pairwise distance matrices, scales near-linearly with the number of frames on standard hardware, and naturally supports incremental operation. The method offers a practical combination of speed and interpretability for large-scale trajectory analysis.
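The merge test described above can be illustrated with the classic BIRCH cluster feature (count, linear sum, squared sum), from which the post-merge radius follows without revisiting earlier frames. This is a simplified sketch, not the mdBIRCH code: it keeps a flat list of clusters instead of a CF-tree, assumes frames are already aligned and flattened into coordinate vectors, and skips the calibration of the radius to RMSD. The names `ClusterFeature` and `online_cluster` are hypothetical.

```python
import math


class ClusterFeature:
    """BIRCH-style summary of a cluster: count, linear sum, and squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-coordinate sum of member vectors
        self.ss = 0.0           # sum of squared norms of member vectors

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_if_added(self, x):
        """RMS distance of members to the centroid after hypothetically adding x."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, x)]
        ss = self.ss + sum(v * v for v in x)
        variance = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative round-off

    def add(self, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)


def online_cluster(frames, threshold):
    """Assign each frame to the nearest cluster if the merge test passes."""
    clusters, labels = [], []
    for x in frames:
        best = None
        if clusters:
            best = min(
                range(len(clusters)),
                key=lambda k: math.dist(clusters[k].centroid(), x),
            )
        if best is not None and clusters[best].radius_if_added(x) <= threshold:
            clusters[best].add(x)
            labels.append(best)
        else:
            cf = ClusterFeature(len(x))
            cf.add(x)
            clusters.append(cf)
            labels.append(len(clusters) - 1)
    return labels, clusters


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
    print(online_cluster(frames, threshold=0.5)[0])
```

Each decision touches only the per-cluster summaries, which is why no pairwise distance matrix is needed and why frames can be processed as they arrive.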
Perez, L.; Iradukunda, M.; Krizanc, D.; Thayer, K.; Weir, M. P.
Developing approaches to link structure and function is an ongoing challenge in computational and structural biology. Using a systems-level framework, we present here an analysis pipeline in a Python package, mdsa-tools, that constructs network representations of structures in a time series of trajectory frames from molecular dynamics (MD) simulations. Here, we demonstrate its use on a ribosomal subsystem. The subsystem is centered on the CAR interaction surface, a "brake pad" adjacent to the aminoacyl (A-site) decoding center that tunes protein translation rates. We leverage unsupervised learning algorithms to explore the conformational landscape of behaviors visited by two versions of the subsystem (brake-on and brake-off) that differ at the codon 3′ adjacent to the A-site codon. Our network representations of MD frames embody H-bond interactions between all pairwise combinations of residues in the system. By utilizing per-frame vector representations of network edges, we can apply standard clustering and dimensionality reduction methods to explore behavioral differences between the brake-on and brake-off versions of the system. K-means clustering of frame vectors revealed a striking separation of the two system versions, consistent with principal components analysis (PCA) embeddings and Uniform Manifold Approximation and Projection (UMAP) embeddings. Dissection of K-means centroids and PCA loadings highlighted H-bond interactions between residue pairs in the ribosome's peptidyl site (P site), suggesting potential allosteric signaling across the subsystem.
Author summary
With the impressive development of computational algorithms to successfully simulate the dynamics of biological molecules over time, the exploration and incorporation of systems modes of analysis is a natural next step to begin to understand the molecular dynamics behaviors that emerge from these experiments. Following the approaches of classical molecular genetics, we used a "computational genetics" paradigm where we introduced changes (mutations) in potentially important residues, changing their identities or modifying their chemical properties, and asked how the dynamic system responded to these changes, viewing the simulations as a series of movie frames of the dynamic structure over time. Starting with network representations of each frame's structure, where the nodes are residues and the edges denote H-bond interactions between the residues, we used several unsupervised machine learning algorithms to uncover behavioral changes in the different mutated versions of the system. Applied to our ribosome neighborhood, this revealed unexpected changes in behavior at the ribosome peptidyl site (P site) in response to mutating mRNA residues on the other side of the aminoacyl site (A site) codon, suggesting long-range allosteric interactions across the neighborhood.
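The per-frame vector representation of network edges can be pictured as one slot per residue pair, filled with the H-bond count observed in that frame. The sketch below shows this flattening step only; it does not use or reproduce the mdsa-tools API, and the function names and the `(residue_a, residue_b, count)` input layout are assumptions for illustration.

```python
from itertools import combinations


def edge_index(residues):
    """Map every unordered residue pair to a fixed position in the frame vector."""
    ordered = sorted(residues)
    return {pair: i for i, pair in enumerate(combinations(ordered, 2))}


def frame_vector(hbonds, index):
    """Flatten one frame's H-bond network into a vector of per-pair counts.

    hbonds: iterable of (residue_a, residue_b, count) for distinct residues
            (hypothetical input layout).
    index:  mapping produced by edge_index.
    """
    vec = [0] * len(index)
    for a, b, count in hbonds:
        key = (a, b) if a < b else (b, a)   # edges are undirected
        vec[index[key]] += count
    return vec


if __name__ == "__main__":
    idx = edge_index([3, 1, 2])
    print(frame_vector([(2, 1, 1), (3, 2, 2)], idx))
```

Stacking one such vector per frame gives a frames-by-edges matrix, which is the form that standard K-means, PCA, or UMAP implementations expect as input.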
Cui, T.; Wang, Z.; Wang, T.
AI-based molecular dynamics simulation brings ab initio calculations to biomolecules in an efficient way, in which the machine learning force field (MLFF) plays a central role by accurately predicting molecular energies and forces. Most existing MLFFs assume localized interatomic interactions, limiting their ability to accurately model non-local interactions, which are crucial in biomolecular dynamics. In this study, we introduce ViSNet-PIMA, which efficiently learns non-local interactions through a physics-informed multipole aggregator (PIMA) and accurately encodes molecular geometric information. ViSNet-PIMA outperforms all state-of-the-art MLFFs for energy and force predictions of different kinds of biomolecules and various conformations on the MD22 and AIMD-Chig datasets, while adapting the PIMA blocks into other MLFFs further achieves 55.1% performance gains, demonstrating the superiority of ViSNet-PIMA and the universality of the model design. Furthermore, we propose AI2BMD-PIMA to incorporate ViSNet-PIMA into the AI2BMD simulation program by introducing a "Transfer Learning-Pretraining-Finetuning" scheme and replacing molecular mechanics-based non-local calculations among protein fragments with ViSNet-PIMA, which reduces AI2BMD's energy and force calculation errors by more than 50% for different protein conformations and protein folding and unfolding processes. ViSNet-PIMA advances ab initio calculation for entire biomolecules, amplifying the application value of AI-based molecular dynamics simulations and property calculations in biochemical research.
Jang, L. S.-e.; Cha, S.; Steinegger, M.
Terminal-based workflows are central to large-scale structural biology, particularly in high-performance computing (HPC) environments and SSH sessions. Yet no existing tool enables real-time, interactive visualization of protein backbone structures directly within a text-only terminal. To address this gap, we present StrucTTY, a fully interactive, terminal-native protein structure viewer. StrucTTY is a single self-contained executable that loads multiple PDB and mmCIF files, normalizes three-dimensional coordinates, and renders protein structures as ASCII graphics. Users can rotate, translate, and zoom in on structures, adjust visualization modes, inspect chain-level features and view secondary structure assignments. The tool supports simultaneous visualization of up to nine protein structures and can directly display structural alignments using Foldseek's output, enabling rapid comparative analysis in headless environments. The source code is available at https://github.com/steineggerlab/StrucTTY.
Key Messages
- Real-time, interactive protein structure visualization directly within text-only terminals
- ASCII-based, depth-aware rendering of PDB and mmCIF backbone structures
- Multi-structure comparison with direct application of Foldseek alignment transformations
- Designed for headless workflows on remote servers and HPC systems
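To make the idea of depth-aware ASCII rendering concrete, here is a toy sketch that normalizes coordinates, projects them orthographically onto a character grid, and picks a character by depth. It is not StrucTTY's implementation: the function name `render_ascii`, the shade characters, the grid size, and the rule that the point with the largest z wins a cell are all assumptions for illustration.

```python
def render_ascii(coords, width=20, height=10, shades=".:*#"):
    """Project (x, y, z) points onto a text grid; larger z draws a denser character."""
    xs, ys, zs = zip(*coords)

    def norm(value, lo, hi):
        # Scale to [0, 1]; a degenerate axis maps to the middle of the range.
        return 0.5 if hi == lo else (value - lo) / (hi - lo)

    grid = [[" "] * width for _ in range(height)]
    depth = [[-1.0] * width for _ in range(height)]
    for x, y, z in coords:
        col = round(norm(x, min(xs), max(xs)) * (width - 1))
        row = round((1.0 - norm(y, min(ys), max(ys))) * (height - 1))  # y points up
        zn = norm(z, min(zs), max(zs))
        if zn > depth[row][col]:  # keep only the point with the largest z per cell
            depth[row][col] = zn
            level = min(int(zn * len(shades)), len(shades) - 1)
            grid[row][col] = shades[level]
    return "\n".join("".join(line) for line in grid)


if __name__ == "__main__":
    helix_like = [(i * 0.5, (i % 4) * 0.3, (i % 3) * 1.0) for i in range(30)]
    print(render_ascii(helix_like))
```

Rotation, translation, and zoom would amount to transforming the coordinates before this projection step.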
Secker, C.; Secker, P.; Yergoez, F.; Celik, M. O.; Chewle, S.; Phuong Nga Le, M.; Masoud, M.; Christgau, S.; Weber, M.; Gorgulla, C.; Nigam, A.; Pollice, R.; Schuette, C.; Fackeldey, K.
The identification of suitable lead molecules in the vast chemical space is a critical and challenging task in drug discovery campaigns. Recently, it has been demonstrated that large-scale virtual screening provides a powerful approach to accelerate the identification of novel drug candidates by screening ever increasing virtual ligand libraries, which have reached magnitudes of >10^20 compounds. However, this desirable increase in potentially bioactive molecules poses a new challenge as enumerating and virtually screening such huge compound libraries is computationally prohibitive. Consequently, advanced approaches to navigate ultra-large chemical spaces and to identify suitable candidate molecules therein are urgently needed. Here, we present an evolutionary algorithm framework using molecular generative AI, reaction-based substructure searching, and iterative model fine-tuning for a targeted and efficient exploration of chemical fragment spaces. Combining this approach with large-scale virtual screening we are able to identify target-specific candidate molecules within the commercially available Enamine REAL Space (~10^15). We demonstrate the applicability of the approach by successfully identifying and biochemically validating pH-specific ligands of the μ-opioid receptor. Our results demonstrate that integrating generative AI with evolutionary algorithms provides a promising route to explore ultra-large chemical spaces for the discovery of novel, synthetically accessible lead molecules.
Deyawe Kongmeneck, A.; San Ramon, G.; Delisle, B.; Kekenes-Huskey, P.
Long QT syndrome Type 2 (LQT2) is a genetic disorder caused by missense mutations in the KCNH2 gene that encodes the potassium channel KV11.1. Previous studies have shown that most KV11.1 missense mutations with loss-of-function phenotypes result from impaired trafficking from the endoplasmic reticulum to the plasma membrane. To investigate the molecular basis of these defects, we used molecular dynamics simulations to analyze two sets of disease-associated missense mutations: those that suppress and those that maintain normal channel trafficking. We focused initially on the conformational and dynamic differences between wild-type KV11.1 and several mutants when two K+ ions were placed in the selectivity filter (SF). Our study reveals that missense mutations in the S4 helix allosterically disrupt the selectivity filter, a critical determinant for proper channel trafficking. Trafficking-competent variants largely retained a wild-type selectivity filter structure, whereas trafficking-deficient mutants exhibited pronounced structural perturbations in this region. These findings suggest that certain LQT2-associated missense mutations in KCNH2 impair channel trafficking by compromising the structural integrity of the selectivity filter. We additionally found that the second-site variant Y652C in the drug binding vestibule can correct structural defects associated with some mistrafficking variants.
Zou, R.; Nag, S.; Sousa, V.; Moren, A. F.; Toth, M.; Meynaq, Y. K.; Pedergnana, E.; Valade, A.; Mercier, J.; Vermeiren, C.; Motte, P.; Zhang, X.; Svenningsson, P.; Halldin, C.; Varrone, A.; Agren, H.
Synaptic vesicle glycoproteins 2 (SV2) are integral membrane proteins essential for neurotransmitter release and are implicated in neurological disorders including epilepsy and Parkinson's disease. In the attempt to develop a ligand selective for SV2C, and in collaboration with UCB, UCB-F was identified as a potential candidate. However, the affinity of UCB-F to SV2C was found to be temperature dependent, decreasing about 10-fold from 4 °C to 37 °C. UCB1A was subsequently identified as an SV2C ligand displaying in vitro a 100-fold selectivity for SV2C compared with SV2A. In this study we investigated whether the binding of UCB1A to SV2A and SV2C was affected by temperature. A combination of experimental binding assay data and molecular dynamics (MD) simulations was used. The binding studies revealed that UCB1A affinity for SV2A decreased significantly at 37 °C compared with 4 °C, whereas binding to SV2C remained largely unchanged. MD simulations reproduced these observations: ligand RMSD values at 310 K showed that UCB1A binding fluctuated markedly in the SV2A complex, with many trajectories exceeding the 3.0 Å stability cutoff, whereas UCB1A remained relatively well-anchored in SV2C under the same conditions. Structural analysis showed that, while UCB1A adopts a conserved binding pose across all isoforms stabilized by π-π stacking and a hydrogen bond with Asp, SV2C possesses a unique stabilizing feature. In SV2C, Tyr298 is less exposed to the solvent and engages in a persistent hydrogen bond with Asparagine, a structural feature that reinforces pocket stability and limits temperature-induced destabilization. This interaction is absent in SV2A, consistent with its greater temperature sensitivity. Together, these findings provide a mechanistic explanation for the experimentally observed temperature independence of UCB1A binding to SV2C.
More broadly, the results highlight the importance of incorporating physiologically relevant temperatures into SV2 ligand evaluation and demonstrate how combining experiments with simulations can uncover isoform-specific mechanisms of ligand recognition and stability.
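The ligand stability criterion in this abstract (ligand RMSD compared against a 3.0 Å cutoff) can be sketched as follows. This is an illustrative calculation, not the authors' analysis script: it assumes frames have already been superposed on the protein so that no fitting is done here, and the function names and the list-of-tuples coordinate layout are hypothetical.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) atoms."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and the same size")
    squared = sum(
        (p - q) ** 2
        for atom_ref, atom_frm in zip(reference, frame)
        for p, q in zip(atom_ref, atom_frm)
    )
    return math.sqrt(squared / len(reference))


def fraction_above_cutoff(reference, frames, cutoff=3.0):
    """Return the fraction of frames whose ligand RMSD exceeds the cutoff (in Å)."""
    values = [rmsd(reference, frame) for frame in frames]
    return sum(v > cutoff for v in values) / len(values), values


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    moved = [(4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(fraction_above_cutoff(ref, [ref, moved]))
```

Comparing this fraction between the SV2A and SV2C trajectories at the same temperature gives a single number per system for the kind of contrast the abstract reports.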
Mlynsky, V.; Kuehrova, P.; Bussi, G.; Otyepka, M.; Sponer, J.; Banas, P.
Understanding RNA structural dynamics is essential for elucidating its biological functions, and molecular dynamics (MD) simulations provide an important atomistic complement to experimental approaches. However, the predictive power of MD is fundamentally limited by the accuracy of the underlying empirical Force Fields (FFs), particularly in capturing the delicate balance of non-bonded interactions. Here, we present a systematic reparameterization strategy that replaces the external gHBfix19 hydrogen-bond (H-bond) correction potential with an equivalent set of NBfix Lennard-Jones modifications within a state-of-the-art RNA FF. Using a quantitatively converged temperature replica-exchange MD ensemble of the GAGA tetraloop, we employed a reweighting-based optimization protocol to derive NBfix parameters that reproduce the thermodynamic effects of the original gHBfix19 terms. Sequential optimization of individual gHBfix19 components proved essential to ensure stable and transferable parameter refinement. The resulting fully reformulated NBfix-based variant, termed OL3CP-NBfix19, was validated on a representative set of RNA motifs, including tetranucleotides, A-form duplexes, and tetraloops. Across all tested systems, its performance is comparable to that of the reference gHBfix19 FF. By embedding the H-bond corrections directly into the standard non-bonded framework, the NBfix formulation eliminates external biasing potentials, simplifies practical deployment, and reduces computational overhead. Beyond this specific reparameterization, our results demonstrate a practical workflow for translating targeted H-bond corrections into native FF terms for efficient biomolecular simulations.
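Reweighting-based optimization rests on estimating how an ensemble average would change under a modified potential without rerunning the simulation. The sketch below shows the standard exponential reweighting estimator together with the Kish effective sample size as a sanity check; it is a generic illustration, not the authors' protocol, and the function name `reweight` and its arguments are assumptions.

```python
import math


def reweight(observable, delta_u, kT):
    """Estimate an average under a perturbed potential from existing samples.

    observable: per-frame values of the quantity of interest.
    delta_u:    per-frame energy change (new minus old potential), same units as kT.
    kT:         thermal energy.
    Returns (reweighted average, Kish effective sample size).
    """
    if len(observable) != len(delta_u) or not observable:
        raise ValueError("inputs must be non-empty and the same length")
    shift = min(delta_u)  # subtract the minimum for numerical stability
    raw = [math.exp(-(du - shift) / kT) for du in delta_u]
    total = sum(raw)
    weights = [w / total for w in raw]
    average = sum(w * o for w, o in zip(weights, observable))
    ess = 1.0 / sum(w * w for w in weights)
    return average, ess


if __name__ == "__main__":
    print(reweight([1.0, 2.0, 3.0], [0.0, 0.5, 1.0], kT=0.6))
```

A small effective sample size signals that the perturbation is too large for the sampled ensemble to represent, which is one reason parameter changes are usually kept modest in schemes of this kind.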
Nair, V.; Niknam Hamidabad, M.; Erol, D.; Mansbach, R.
There has been a surge in antibiotic resistance in recent years, making traditional antibiotics less effective against key pathogens. RNA has recently emerged as a potential target for antibiotics due to its involvement in crucial microbial functions. It is possible to expand the range of therapeutic targets by using RNA-based therapies, but it remains necessary to improve the molecular-level understanding of interactions between RNA and known and potential binders. The SAM-I riboswitch, which controls the transcriptional termination of gene expression involved in sulfur metabolism in most bacteria, is an excellent ligand target. Thus, understanding its behavior with and without ligand complexes would be very helpful for drug design applications. In this manuscript, we studied the interactions between the SAM-I riboswitch and its natural ligand, SAM, which controls riboswitch function, and compared those interactions to its interactions with the very similar small molecule SAH, which does not control riboswitch function, and to its interactions with a potential binder JS4, identified via virtual screening. From our simulations, we gain a deeper understanding of small molecule interactions with the SAM-I riboswitch. The results reveal how differently the small molecules (SAM, SAH and JS4) bind to and potentially induce conformational changes in the riboswitch. Our findings offer valuable insight into the molecular mechanisms underlying riboswitch RNA-ligand interactions for the design of more effective RNA-targeting therapeutics.
Broster, J. H.; Popovic, B.; Kondinskaia, D.; Deane, C. M.; Imrie, F.
Molecular docking aims to predict the binding conformation of a small molecule to its protein target. Recent work has proposed diffusion models for this task, from rigid-body docking that diffuses over ligand degrees of freedom to co-folding approaches that jointly generate protein structure and ligand pose. However, diffusion-based docking models have been shown to frequently produce physically implausible poses and fail to consistently recover key protein-ligand interactions. To address this, we introduce a reinforcement learning framework for training diffusion-based docking models directly on non-differentiable objectives. Fine-tuning DiffDock-Pocket for physical validity with our approach substantially increases the number of generated poses that are physically valid and interaction-preserving, with no increase in inference-time compute. Importantly, this comes without sacrificing structural accuracy; in fact, our approach increases the proportion of structures with near-native poses. These effects are most pronounced for protein targets that are dissimilar to the training data. Our fine-tuned DiffDock-Pocket model outperforms both classical docking algorithms and machine learning-based approaches on the PoseBusters set. Our results demonstrate that reinforcement learning can teach diffusion-based docking models to better respect physical constraints and recover key interactions, without the requirement to rely on inference-time corrections.
Pedraza, E.; Tejedor, A. R.; S. Zorita, A.; Collepardo-Guevara, R.; De Sancho, D.; Llombart, P.; Rene Espinosa, J.
Biomolecular condensates formed by complex coacervation of highly charged proteins provide a powerful framework to understand how microscopic interactions give rise to macroscopic material properties. Atomistic molecular dynamics simulations provide detailed insights but remain limited in accessing the spatio-temporal scales relevant for condensate behavior. Here, we use the residue-level coarse-grained Mpipi-Recharged model to investigate condensates formed by ProT and positively charged partners, including histone H1, protamine, poly-lysine, and poly-arginine. Material properties, in this context, provide a stringent experimental benchmark for coarse-grained models. Our model reproduces salt-dependent phase behavior, protein binding affinities, and sequence-specific stability trends in agreement with in vitro experiments, despite the fact that material properties were not included in the model parametrization. We then establish a direct link between protein dynamics and macroscopic material properties by quantifying monomeric diffusion, conformational reconfiguration, and translational mobility within the dense phase, and relating these to condensate viscosity. By comparing dynamics across dense and dilute phases, we uncover a pronounced length scale-dependent behavior. While residue-level binding and unbinding events remain equally fast in both phases, protein reconfiguration time and self-diffusion are significantly slowed down within the condensates. This decoupling reveals how fast intermolecular interactions coexist with slow mesoscale condensate dynamics depending on the molecular length scale. Together, our results establish a predictive framework that links sequence-encoded intermolecular forces and multiscale dynamics to the emergent material properties of complex biomolecular condensates.
Poelmans, R.; Bruncsics, B.; Arany, A.; Van Eynde, W.; Shemy, A.; Moreau, Y.; Voet, A. R.
Knowledge-based potentials (KBPs) have long been used to score protein-ligand interactions, yet existing formulations remain isotropic, capturing only distance dependencies and neglecting the directional preferences that govern molecular recognition. Here, we introduce Direction-Enhanced Scoring POTentials (DESPOT), an anisotropic knowledge-based framework that unifies pose scoring and binding-site characterisation within a single probabilistic model. The new probabilistic formulation used in DESPOT naturally supports directional modelling through atom type-specific local reference frames and symmetry-aware geometric discretisation. It also supports steric exclusion, encoded as a dedicated void state that explicitly captures the probability that a spatial bin remains unoccupied. The anisotropic interaction profiles learned by DESPOT reveal systematic directional preferences for interactions such as hydrogen bonds, aromatic interactions, and halogen bonds, that extend beyond idealised geometric models. Evaluation on the CASF-2016 benchmark shows that DESPOT substantially outperforms isotropic KBPs in all pose-discrimination and virtual screening tasks (p << 0.0001 for all enrichment factors), with the largest gains arising from its ability to penalise geometrically implausible poses. Constrained energy minimisation of training structures proves strongly beneficial for the derivation of KBPs, while our train-test leakage analysis reveals that overfitting is an underestimated and understudied issue for KBPs. DESPOT provides a data-driven framework for direction-aware modelling of protein-ligand interactions, with applications in pose scoring, binding-site characterisation, and structure-based design.